Significance tests for multi-component estimands from multiply imputed, synthetic microdata
Abstract
To limit the risk of disclosure when releasing data to the public, it has been suggested that statistical agencies release multiply imputed, synthetic microdata. For example, the released microdata can be fully synthetic, comprising random samples of units from the sampling frame with simulated values of variables. Alternatively, the released microdata can be partially synthetic, comprising the units originally surveyed with some collected values (e.g., sensitive values at high risk of disclosure or values of key identifiers) replaced with multiple imputations. This article presents inferential methods for synthetic data for multi-component estimands, in particular procedures for Wald and likelihood ratio tests. The performance of the procedures is illustrated with simulation studies. © 2004 Published by Elsevier B.V.
Similar resources
Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study
The paper presents an illustration and empirical study of releasing multiply imputed, fully synthetic public use microdata. Simulations based on data from the US Current Population Survey are used to evaluate the potential validity of inferences based on fully synthetic data for a variety of descriptive and analytic estimands, to assess the degree of protection of confidentiality that is afford...
Distribution-Preserving Statistical Disclosure Limitation
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences based on the partially synthetic data, because the imputation model determines the distribution of s...
Distribution-preserving statistical disclosure limitation
One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate th...
Synthetic Datasets for the German IAB Establishment Panel
Disseminating microdata to the public that provide a high level of data utility while at the same time guaranteeing the confidentiality of the survey respondent is a difficult task. Generating multiply imputed synthetic datasets is an innovative statistical disclosure limitation technique with the potential of enabling the data disseminating agency to achieve this twofold goal. So far, the appr...
Likelihood Based Finite Sample Inference for Singly Imputed Synthetic Data Under the Multivariate Normal and Multiple Linear Regression Models
In this paper we develop likelihood-based finite sample inference based on singly imputed partially synthetic data, when the original data follow either a multivariate normal or a multiple linear regression model. We assume that the synthetic data are generated by using the plug-in sampling method, where unknown parameters in the data model are set equal to observed values of their point estima...